R3 delta replay picks by samsja · Pull Request #2647 · PrimeIntellect-ai/prime-rl

samsja · 2026-05-27T01:48:54Z

No description provided.

Squashed from origin/r3-delta (tip 5c94833, which extends the earlier 3799bda with 'Support branched routed expert deltas' for cases where the routed-experts payload diverges across siblings in a group).

Adapts delta replay to main's deferred routed-experts chunk concat: first step starts at 0; extended steps use prefix_len - 1; row 0 fills the boundary, remaining rows append as the new suffix.

Bumps router wheel pin to local-path. Bumps deps/verifiers gitlink to d39cc5876. Adds four debug configs for router-replay validation.

Co-Authored-By: S1ro1 <matej.sirovatka@gmail.com>
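One way to read the offset rule above is the following minimal sketch. It assumes routed-expert rows are stored one per token and that an extended step's delta begins at the boundary position prefix_len - 1. The names `delta_start` and `apply_delta` are hypothetical and are not taken from the PR.

```python
from typing import List, Sequence

Row = List[int]  # hypothetical: expert ids routed for one token


def delta_start(prefix_len: int, is_first_step: bool) -> int:
    """Offset at which a step's delta rows begin."""
    return 0 if is_first_step else prefix_len - 1


def apply_delta(
    stored_rows: Sequence[Row],
    delta_rows: Sequence[Row],
    prefix_len: int,
    is_first_step: bool,
) -> List[Row]:
    """Merge delta rows into the stored rows.

    Row 0 of the delta lands on the boundary position; the remaining
    rows become the new suffix. Anything already stored at or past the
    boundary is replaced (an assumption of this sketch).
    """
    start = delta_start(prefix_len, is_first_step)
    return list(stored_rows[:start]) + list(delta_rows)


# First step: the delta is the whole payload.
rows = apply_delta([], [[0], [1]], prefix_len=0, is_first_step=True)
# Extended step with a 3-token prefix: the delta starts at index 2.
rows = apply_delta(rows, [[2], [3], [4]], prefix_len=3, is_first_step=False)
```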

The first-match-wins loop over active_samples picks the wrong sample when one active prefix is a strict prefix of another. This can happen after a compaction/rollback step whose prompt is shorter than an existing sample's prefix and whose completion re-generates the same tokens and extends past them: the new sample's prefix then starts with the older sample's prefix, and any later step that extends the new sample also satisfies the slice check against the older one. When that happens, extend_sample folds the newer sample's generated tokens into the older sample as user-input tokens (mask=False, logprob=0) and leaves the newer sample stale -- a silent Exact-Prefix invariant violation.

Switch to longest-match: strictly more specific, never worse than first-match when only one prefix matches.

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit 0e239d1)
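The difference between first-match and longest-match can be illustrated with a small sketch. It assumes each active sample is represented only by its prefix token ids; `find_longest_matching_prefix` is a hypothetical helper, not the function used in the PR.

```python
from typing import Optional, Sequence


def find_longest_matching_prefix(
    prompt_ids: Sequence[int],
    active_prefixes: Sequence[Sequence[int]],
) -> Optional[int]:
    """Return the index of the longest active prefix that the prompt starts with."""
    best_idx: Optional[int] = None
    best_len = -1
    for idx, prefix in enumerate(active_prefixes):
        n = len(prefix)
        # Slice check: the prompt must begin with exactly this prefix.
        if n <= len(prompt_ids) and list(prompt_ids[:n]) == list(prefix):
            if n > best_len:
                best_idx, best_len = idx, n
    return best_idx


older = [1, 2, 3]
newer = [1, 2, 3, 4, 5]  # starts with the older sample's prefix
prompt = [1, 2, 3, 4, 5, 6]

# First-match would stop at index 0; longest-match selects index 1.
chosen = find_longest_matching_prefix(prompt, [older, newer])
```

When only one prefix matches, the result is the same as first-match, which is why the change cannot make the selection worse.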

When more than one active prefix matches a step's prompt, log a warning with the example id, step index, set of matching prefix lengths, total active prefixes, and the prompt length. Longest-match still picks the correct extension; the warning just surfaces the rare ambiguous case so it's debuggable if it starts showing up in real rollouts (e.g. from compaction/rollback turns).

Co-authored-by: Cursor <cursoragent@cursor.com>
(cherry picked from commit ca38614)
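A sketch of what such a warning could look like, using Python's standard `logging` module. The function name, logger name, and message format are assumptions for illustration; only the logged fields come from the commit message.

```python
import logging
from typing import Optional, Sequence

logger = logging.getLogger("prefix_match_sketch")  # hypothetical logger name


def select_prefix_with_warning(
    prompt_ids: Sequence[int],
    active_prefixes: Sequence[Sequence[int]],
    example_id: str,
    step_idx: int,
) -> Optional[int]:
    """Pick the longest matching prefix and warn when the match is ambiguous."""
    # Collect (prefix_length, index) for every prefix the prompt starts with.
    matches = [
        (len(p), i)
        for i, p in enumerate(active_prefixes)
        if len(p) <= len(prompt_ids) and list(prompt_ids[: len(p)]) == list(p)
    ]
    if not matches:
        return None
    if len(matches) > 1:
        logger.warning(
            "Ambiguous prefix match: example_id=%s step=%d "
            "matching_prefix_lens=%s active_prefixes=%d prompt_len=%d",
            example_id,
            step_idx,
            sorted(n for n, _ in matches),
            len(active_prefixes),
            len(prompt_ids),
        )
    # max() compares by length first, so the longest prefix wins.
    return max(matches)[1]
```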

Add slurm.cleanup_grace_period_seconds (default 3600) so that when a component exits (completion, crash, or SIGTERM) the multi-node RL and inference sbatch teardown sends SIGTERM and then waits up to the grace period for the remaining processes to exit before force-killing and releasing the allocation. This gives in-flight work, notably trainer checkpoint writes, a bounded window to flush. The wait ends as soon as all processes exit, so it is only an upper bound; set to 0 for the previous immediate force-kill behavior.

Closes #2664

Co-authored-by: Cursor <cursoragent@cursor.com>
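The SIGTERM-then-wait pattern described here can be sketched in Python with `subprocess`. The actual teardown lives in the sbatch templates, so this is only an illustration of the control flow; `terminate_with_grace` and `poll_interval` are hypothetical names.

```python
import subprocess
import sys
import time
from typing import List, Sequence


def terminate_with_grace(
    procs: Sequence[subprocess.Popen],
    grace_period: float,
    poll_interval: float = 0.1,
) -> List[subprocess.Popen]:
    """Send SIGTERM, wait up to grace_period seconds, then force-kill stragglers.

    Returns the processes that had to be force-killed.
    """
    for p in procs:
        if p.poll() is None:
            p.terminate()  # SIGTERM on POSIX

    # The wait ends as soon as every process has exited, so the grace
    # period is only an upper bound.
    deadline = time.monotonic() + grace_period
    while time.monotonic() < deadline and any(p.poll() is None for p in procs):
        time.sleep(poll_interval)

    forced = []
    for p in procs:
        if p.poll() is None:
            p.kill()  # SIGKILL on POSIX
            forced.append(p)
    for p in procs:
        p.wait()  # reap every child
    return forced
```

With `grace_period=0` the loop body never runs, so any process still alive right after the SIGTERM is killed immediately.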

The previous SIGTERM-then-wait approach didn't help the target case (inference dies while the trainer is mid-checkpoint on another node): that teardown is driven by `srun --kill-on-bad-exit=1`, which reaps the trainer task via SLURM's own KillWait path and never runs our in-task grace loop.

Instead, on a non-zero exit the failing node now stays alive (signalling nothing) for the grace period before propagating the exit. Because --kill-on-bad-exit only fires when a task exits, holding the failing task keeps peer nodes' checkpointing trainers running untouched until they flush. Clean (zero-exit) completion is unaffected.

Scope to multi_node_rl only; the inference-only template has no trainer checkpoints to protect, so it reverts to immediate teardown.

Co-authored-by: Cursor <cursoragent@cursor.com>
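The hold-then-propagate idea can be illustrated with a few lines of Python. The real logic sits in the multi_node_rl sbatch template; `propagate_exit` and its injectable `sleep` parameter are hypothetical and exist only to make the sketch testable.

```python
import time
from typing import Callable


def propagate_exit(
    exit_code: int,
    grace_period: float,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hold a failing task for the grace period before returning its exit code.

    No signal is sent to anyone. The task simply stays alive, so a
    kill-on-bad-exit policy has nothing to react to until the hold ends.
    """
    if exit_code != 0 and grace_period > 0:
        sleep(grace_period)
    # A clean (zero) exit is returned immediately.
    return exit_code
```

A task wrapper would then end with something like `sys.exit(propagate_exit(returncode, grace_period))`, so peers keep running for up to `grace_period` seconds after a failure.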

samsja and others added 4 commits May 24, 2026 19:47

Handle token export mkdir races

5ee18c8

samsja mentioned this pull request May 27, 2026

R3 delta replay picks (no configs) #2648

Closed

samsja and others added 9 commits May 27, 2026 08:19

Log per-server inference metrics

8f8bd91

fix: pin vLLM no-coordinator wheel

84ed291

refactor: rename cleanup_grace_period_seconds to cleanup_grace_period

3a45b7f

Drop the _seconds suffix; the unit is documented in the field docstring. Co-authored-by: Cursor <cursoragent@cursor.com>

chore: drop CHANGELOG entry for cleanup_grace_period

57e7510

Co-authored-by: Cursor <cursoragent@cursor.com>

docs: note cleanup_grace_period is multi-node RL only

6685b3f

Co-authored-by: Cursor <cursoragent@cursor.com>

docs: minimize cleanup_grace_period docstring

d8a184c

Co-authored-by: Cursor <cursoragent@cursor.com>

Feat: retry/logging

ca0745d

samsja wants to merge 13 commits into main from r3-delta-replay-picks

3 participants
